ai-chat: architecture cleanup, bug fixes, storage guards, new features, docs rewrite #899
threepointone merged 14 commits into main
Conversation
🦋 Changeset detected. Latest commit: 54bcc0f. The changes in this PR will be included in the next version bump. This PR includes changesets to release 2 packages.
Large refactor and feature update for the ai-chat package: extracted ResumableStream, simplified SSE parsing (message-builder), added WebSocket chat transport, and organized deprecated APIs. Adds many new tests (unit, react, and Playwright E2E), new examples (examples/ai-chat), and new docs (chat-agents.md plus updates to human-in-the-loop, client-tools-continuation, resumable-streaming, and more). Introduces features like incremental persistence, row-size guard, maxPersistedMessages, top-level body option for useAgentChat, and improved tool streaming/continuation flows. Misc: package.json updates and .gitignore entry for Playwright test-results.
threepointone force-pushed from 96eaea5 to a278f92.
/bonk review this PR
Several fixes to make streaming more robust and update E2E test metadata:

- e2e/llm.spec.ts: Update the test description to reflect Workers AI usage (AI binding in wrangler.jsonc) and note that no OpenAI API key is required; BadKeyAgent remains for invalid-key error handling.
- src/index.ts: Add a TODO and comment explaining a race where the completion promise can be null and a 500ms fallback is used; suggests a more deterministic signal.
- src/resumable-stream.ts: Use a local streamId snapshot and guard against it being undefined; replace the buggy self-comparison with a defensive check to detect if the active stream changed between lookup and replay.
- src/ws-chat-transport.ts: Introduce a `completed` flag to avoid handling aborts after the stream finished; set `completed = true` on error and done to prevent spurious abort handling.

These changes address race conditions around stream lifecycle, make intent clearer, and prevent duplicate/late abort handling.
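For illustration, the `completed` guard reduces to something like this (a minimal sketch -- the class and member names besides `completed` are invented, only the flag behavior mirrors the change):

```ts
// Minimal sketch of the `completed` guard in ws-chat-transport.ts.
// Names other than `completed` are illustrative.
class StreamGuard {
  private completed = false;

  onChunk(chunk: { type: string }) {
    // Mark completion on both the done and error paths so a late
    // abort (e.g. CF_AGENT_CHAT_REQUEST_CANCEL) is ignored.
    if (chunk.type === "done" || chunk.type === "error") {
      this.completed = true;
    }
  }

  onAbort(teardown: () => void) {
    if (this.completed) return; // stream already finished; nothing to abort
    teardown();
  }
}
```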
/bonk I addressed your review feedback, have another look
Delete pr.md containing the large PR draft and changelog notes. Removes an obsolete/draft documentation file from the repository.
- Add accurate UTF-8 byte-length measurement and apply it to row/chunk size guards to prevent SQLite row-limit issues.
- Preserve the latest custom request body for tool continuations and forward it to continuations.
- Harden streaming cleanup: always clear streaming references, resolve completion promises on errors, remove abort controllers, and emit observability.
- Improve resumable stream handling: skip oversized chunks using byte size, clear the in-memory chunk buffer on clearAll, and include errored streams in periodic cleanup.
- Add a fallback to create reasoning parts when a reasoning-delta arrives without a start.
- Update tests to use ws.close(1000); add tests for buffer clearing, errored-stream cleanup, and multi-byte Unicode byte-length behavior; expose small TestChatAgent helpers for testing stream cleanup and insertion of old errored streams.
/bonk I addressed your review feedback again, please review again
In packages/ai-chat/src/index.ts: add a cached TextEncoder (_encoder) and use it in _byteLength to avoid allocating a new encoder on every call (performance). Use this._lastBody as the request body instead of conditionally sending customBody. Adjust streaming cleanup to always remove the abort controller but only emit observability events when the stream completed successfully (prevents emitting on error paths). These changes reduce allocations and tighten observability event emission semantics.
/bonk I pushed some changes, do another review please
Add a module-level TextEncoder in packages/ai-chat/src/index.ts and packages/ai-chat/src/resumable-stream.ts and switch byte-length measurements to use it. Removes the AIChatAgent._encoder and avoids creating new TextEncoder instances on each measurement, reducing allocations when enforcing SQLite row and chunk size limits.
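In sketch form, the pattern is simply (helper name assumed):

```ts
// One module-level encoder, reused by every measurement.
const encoder = new TextEncoder();

// UTF-8 byte length of the serialized value -- what SQLite's 2MB row
// limit actually counts. "😀".length is 2 UTF-16 code units, but the
// character encodes to 4 bytes.
function byteLength(value: unknown): number {
  return encoder.encode(JSON.stringify(value)).byteLength;
}
```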
Pass explicit normal close code (1000) to ws.close() calls in packages/ai-chat/src/tests/client-tools-reconnect.test.ts. This makes the WebSocket close behavior explicit in the tests and avoids ambiguous/overloaded calls during test teardown.
Ok I've addressed all feedback, this is in pretty good shape. Once I fix the changeset problem, I'm going to merge this in.

This is throwing an infinite-loop useEffect in the playground; will debug and fix later.
Replace occurrences of the Workers AI model '@cf/openai/gpt-oss-120b' with '@cf/zai-org/glm-4.7-flash' across docs, examples, guides, and e2e tests. Also bump .changeset schema to @changesets/config v3.1.2 and relax @cloudflare/ai-chat peerDependency in packages/agents to allow ^0.1.0. Adjust README and test comments to reflect the new model.
Memoize createStubProxy in useAgent so the agent.stub is referentially stable across renders (useMemo). Update default model for the AI playground (client and server) from @cf/qwen/qwen3-30b-a3b-fp8 to @cf/zai-org/glm-4.7-flash. Also update the changeset to list multiple bug fixes for AIChatAgent (stream resumption, error handling, chunk buffer clearing, UTF-8 row size guard, GC of errored streams, and related stream completion fixes).
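A sketch of the memoization (stand-in declarations included so the snippet is self-contained; the real code lives in packages/agents/src/react.tsx):

```ts
import { useMemo } from "react";

type Call = (method: string, args: unknown[]) => Promise<unknown>;

// Stand-ins for the real helpers in packages/agents/src/react.tsx.
declare const call: Call;
declare function createStubProxy(c: Call): Record<string, unknown>;

// Inside useAgent: rebuild the Proxy only when `call` changes, so
// agent.stub keeps the same identity across renders and is safe to
// use in dependency arrays without retriggering effects.
const stub = useMemo(() => createStubProxy(call), [call]);
```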
/bonk do another review |
When a text-delta is received without a preceding text-start, create the fallback text part with state="streaming" and update tests to verify the streaming state and that a subsequent text-end marks it as "done". Also wrap the waitForStream() continuation flow in a .catch to log tool continuation failures and tidy the continuation reply flow (ensure continuation flag/chatMessageId are passed into _reply). These changes improve stream resumption handling and prevent unhandled promise rejections during tool continuations.
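In sketch form (the part shape is modeled on AI SDK text parts; the real logic in message-builder.ts is more involved):

```ts
type TextPart = { type: "text"; text: string; state: "streaming" | "done" };

// If a text-delta arrives with no open text part (e.g. the first chunk
// seen after a resume), create the fallback part in the "streaming" state.
function onTextDelta(parts: TextPart[], delta: string): void {
  let part = parts.find((p) => p.state === "streaming");
  if (!part) {
    part = { type: "text", text: "", state: "streaming" };
    parts.push(part);
  }
  part.text += delta;
}

// A subsequent text-end marks the part as "done".
function onTextEnd(parts: TextPart[]): void {
  const part = parts.find((p) => p.state === "streaming");
  if (part) part.state = "done";
}
```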
/bonk one more review please. |
Return the value (promise) from this._tryCatchChat(...) so the caller receives and can await the async operation. Previously the call wasn't returned, which could drop the promise/return value and break propagation of the agent continuation/abort flow.
/bonk one more review, tell me if this is ready to land |
Co-authored-by: threepointone <threepointone@users.noreply.github.com>
Here's my review:

Final Review (commit 93e1bbb)

All prior review items: resolved. Every item flagged across the 4 previous review rounds is addressed:
Build and tests
Remaining low-severity items (not blocking)
Verdict

This PR is ready to land. All high and medium severity issues from 4 review rounds are fixed. The code is sound, the architecture extraction is clean, the bug fixes are real and well-tested, and the 136 Workers tests pass. The remaining items are all low-severity nits suitable for follow-up work.
Update package-lock.json to add "peer": true to numerous dependency entries, remove/adjust a few misplaced peer flags, and fix minor license formatting. This standardizes peer metadata across the lockfile to ensure consistent dependency metadata.
Description
TL;DR
- Refactored `AIChatAgent` -- extracted stream resumption, WebSocket transport, and message building into focused modules
- Fixed `setMessages` data loss, the stream resumption race condition (Stream resumption loses reasoning/thinking state after page refresh, #896), SQLite crashes on large tool outputs, and tool continuation hangs on error
- Added `maxPersistedMessages`, a `body` option, incremental persistence, and automatic row size compaction
- Made `onFinish` optional -- the framework handles cleanup automatically
- Memoized `agent.stub` in `useAgent` so it is referentially stable across renders

All changes are backward compatible. No public API was removed.
Motivation
Users were hitting crashes from SQLite's 2MB row limit on large tool outputs, stream resumption had a race condition where reconnecting clients missed the resume notification (#896), and `setMessages` with a functional updater silently lost data. These bugs were difficult to isolate because the codebase had no separation of concerns -- `index.ts` was 2,286 lines with stream resumption, chunk buffering, SSE parsing, tool handling, message persistence, and the WebSocket protocol all interleaved in one class.

On the client side, `useAgentChat` created a fake `fetch` callback that assembled a `Response` object from WebSocket messages, then fed it into the AI SDK's `DefaultChatTransport`, which re-parsed the SSE. Every chunk was serialized, deserialized, re-serialized, and re-deserialized.

Deprecated APIs from the v4/v5 era were mixed in with current code, with no deprecation warnings or clear boundaries. Test coverage was minimal -- no e2e tests, limited unit tests, no React hook tests. Documentation showed outdated patterns that no longer worked correctly with AI SDK v6.
What changed
Architecture (no behavior change)
Extracted three focused modules from the monolithic `AIChatAgent` class:

- `resumable-stream.ts` -- stream chunk management
- `ws-chat-transport.ts` -- `ChatTransport<UIMessage>` for the AI SDK
- `message-builder.ts` -- `UIMessage` parts from stream chunks (shared server/client)

`index.ts` went from 2,286 to ~1,760 lines. `react.tsx` went from 1,461 to ~1,350 lines. The total line count is similar but the code is now modular and testable.

Bug fixes (12)
1. `setMessages` functional updater data loss -- `setMessages(prev => [...prev, msg])` sent an empty array to the server because the wrapper did not resolve the function before syncing (sketch after this list).
2. `_sendPlaintextReply` creating multiple text parts -- each network chunk became a separate `text` part in the message instead of accumulating into one.
3. `JSON.parse(undefined)` threw an uncaught `SyntaxError` when clients sent malformed requests.
4. `CF_AGENT_MESSAGE_UPDATED` not broadcast for streaming messages -- tool results applied during an active stream were silently swallowed instead of being broadcast to other connections.
5. Stream resumption race (#896) -- the server sent `CF_AGENT_STREAM_RESUMING` in `onConnect` before the client's message handler was registered. Fixed with a client-initiated `CF_AGENT_STREAM_RESUME_REQUEST` protocol and a `replay` flag on buffered chunks.
6. `_streamCompletionPromise` not resolved on error -- if a stream errored, the completion promise was never resolved, causing tool continuations waiting on it to hang forever. Moved cleanup into a `finally` block.
7. `body` lost during tool continuations -- custom `options.body` data was not passed through to `onChatMessage` during auto-continuation. Now stored alongside `_lastClientTools`.
8. `clearAll()` not clearing the chunk buffer -- if `clearAll()` was called while chunks were buffered in memory, the next `flushBuffer()` would write orphaned chunks to freshly-cleared tables.
9. Periodic cleanup only collected `status = 'completed'` streams -- errored streams persisted indefinitely.
10. `reasoning-delta` dropping data without `reasoning-start` -- unlike `text-delta`, which had a fallback, `reasoning-delta` silently discarded data if no matching reasoning part existed. Affects stream resumption where `reasoning-start` was missed.
11. Row size guard measured `string.length` instead of byte count -- `JSON.stringify().length` measures UTF-16 code units, not UTF-8 bytes. Multi-byte Unicode (CJK, emoji) could exceed SQLite's 2MB byte limit while passing the character-length check. Now uses `TextEncoder` for accurate measurement.
12. Late aborts after stream completion via `CF_AGENT_CHAT_REQUEST_CANCEL` -- added a `completed` flag guard.
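For illustration, fix 1 comes down to resolving the updater against the current messages before syncing. A minimal sketch (the real wrapper in `react.tsx` also syncs and broadcasts the result):

```ts
type MessagesUpdater<M> = M[] | ((prev: M[]) => M[]);

// Resolve the function form first, so setMessages(prev => [...prev, msg])
// sends the actual resulting array to the server instead of losing it.
function resolveMessages<M>(next: MessagesUpdater<M>, current: M[]): M[] {
  return typeof next === "function" ? next(current) : next;
}
```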
New features

- `maxPersistedMessages` -- cap the SQLite message count. Oldest messages are deleted after each persist. Default: unlimited (backward compatible).
- `body` option on `useAgentChat` -- send custom data with every request. Accepts static objects or functions (sync/async). Available in `onChatMessage` via `options.body`. (Usage sketch for both options after this list.)
- Automatic row size compaction -- oversized messages are compacted in place, with metadata (`compactedToolOutputs`/`compactedTextParts`) so clients can detect compaction.
- Chunk size guard -- `ResumableStream` skips storing chunks over 1.8MB (still broadcast to live clients, just not persisted for replay).
- `onFinish` made optional -- abort controller cleanup and observability emit moved into the framework's stream completion handler.

`agents` package fixes:

- `agent.stub` now referentially stable -- `createStubProxy(call)` was creating a new `Proxy` on every render, causing infinite loops when `agent.stub` was used in dependency arrays. Now memoized with `useMemo`.
- `cacheInvalidatedAt` lint fix -- referenced inside the `useMemo` body to make the cache-buster dependency explicit.
- Relaxed `@cloudflare/ai-chat` peer dep -- `"^0.0.8"` to `"^0.0.8 || ^0.1.0"` to prevent changeset cascade to major bumps.
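A usage sketch of `maxPersistedMessages` and `body` (the hook declaration below is a simplified stand-in so the snippet compiles on its own; see `docs/chat-agents.md` for the real API and option placement):

```ts
// Simplified stand-in for the real hook so this sketch is self-contained.
declare function useAgentChat(opts: {
  agent: unknown;
  body?: object | (() => object | Promise<object>);
}): { messages: unknown[] };

declare const agent: unknown; // from useAgent(...)

// `body` accepts a static object or a sync/async function; server-side
// it arrives in onChatMessage as options.body and, per this PR, is
// preserved across tool continuations.
const chat = useAgentChat({
  agent,
  body: async () => ({ locale: navigator.language })
});

// Server side, setting maxPersistedMessages caps stored history: the
// oldest rows are deleted after each persist (default: unlimited).
```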
Deprecations

All deprecated APIs now emit a one-time `console.warn` on first use, have `@deprecated` JSDoc, and are marked with `// -- DEPRECATED --` section banners. Removal planned for next major.
createToolsFromClientSchemas()Client:
extractClientToolSchemas(),detectToolsRequiringConfirmation(),tools,toolsRequiringConfirmation,experimental_automaticToolResolution,autoSendAfterAllConfirmationsResolved,addToolResult()Migration:
migrateToUIMessage(),migrateMessagesToUIFormat(),needsMigration(),analyzeCorruption()Docs
- `docs/chat-agents.md` -- comprehensive reference for `AIChatAgent` and `useAgentChat` (673 lines). Covers the server API, client API, all three tool patterns, custom request data, resumable streaming, storage management, multiple AI providers, multi-client sync, and the WebSocket protocol.
- `README.md` -- correct patterns, Workers AI examples, no `onFinish` boilerplate.
- `human-in-the-loop.md` -- modern `needsApproval` + `onToolCall` patterns.
- `client-tools-continuation.md` -- `autoContinueAfterToolResult` with Workers AI.
- `resumable-streaming.md` -- accuracy fixes, new protocol details, back-links.
- `index.md` -- moved resumable streaming to the AI Integration section, removed the TODO marker for chat-agents.

Examples
- `examples/ai-chat/` -- showcases all recommended patterns: server tools, client tools (`onToolCall`), tool approval (`needsApproval`), `pruneMessages`, `maxPersistedMessages`, the `body` option, and Workers AI `@cf/zai-org/glm-4.7-flash` (no API key needed).
- `examples/resumable-stream-chat/` -- updated to `toUIMessageStreamResponse()`, fixed the CSS `@source` path.
- `guides/human-in-the-loop/` -- rewritten to modern patterns.
- `examples/playground/` -- memoized callbacks to fix exhaustive-deps warnings.
- `site/ai-playground/` -- changed to `@cf/zai-org/glm-4.7-flash`.

Tests
Notable additions:
- `message-builder.test.ts` (34 tests) -- full chunk type coverage including the tool streaming lifecycle and stream resumption fallbacks
- `row-size-guard.test.ts` (11 tests) -- incremental persistence, compaction, Unicode byte-length, chunk guard
- `max-persisted-messages.test.ts` (5 tests) -- storage cap enforcement
- `onfinish-cleanup.test.ts` (5 tests) -- abort controller cleanup without a user `onFinish`
- `resumable-streaming.test.ts` -- clearAll buffer clearing, errored stream cleanup
- React hook tests -- the `body` option, re-render stability, `clearHistory`, `onToolCall`

Design decisions
Why extract modules instead of rewriting from scratch?
The existing behavior is battle-tested in production. A full rewrite would risk subtle regressions in the WebSocket protocol, hibernation recovery, and stream resumption. Instead, we extracted code into modules with clear interfaces while preserving the exact same behavior. Every extraction was verified by the new test suite.
Why a native WebSocket ChatTransport?
The old approach created a fake `Response` object from WebSocket messages so it could use the AI SDK's HTTP-based `DefaultChatTransport`. This meant every chunk was serialized to SSE, wrapped in a Response, then re-parsed from SSE. The new `WebSocketChatTransport` implements `ChatTransport<UIMessage>` directly, returning a `ReadableStream<UIMessageChunk>` from WebSocket events. No fake fetch, no double serialization.
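A condensed sketch of the shape (the wire format and class below are invented for illustration; the real `ws-chat-transport.ts` implements the full `ChatTransport<UIMessage>` contract plus aborts and reconnection):

```ts
import type { UIMessage, UIMessageChunk } from "ai";

// WebSocket events become UIMessageChunks directly -- no fake
// fetch/Response and no SSE round-trip.
class WsTransportSketch {
  constructor(private ws: WebSocket) {}

  async sendMessages(opts: { messages: UIMessage[] }): Promise<ReadableStream<UIMessageChunk>> {
    this.ws.send(JSON.stringify({ type: "chat-request", messages: opts.messages }));
    return new ReadableStream<UIMessageChunk>({
      start: (controller) => {
        const onMessage = (event: MessageEvent) => {
          const chunk = JSON.parse(event.data as string) as UIMessageChunk;
          controller.enqueue(chunk);
          if (chunk.type === "finish") {
            // Terminal chunk: stop listening and close the stream.
            this.ws.removeEventListener("message", onMessage);
            controller.close();
          }
        };
        this.ws.addEventListener("message", onMessage);
      }
    });
  }
}
```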
Why compaction instead of splitting messages across rows?

SQLite rows have a 2MB hard limit. We considered splitting large messages across multiple rows, but this would have complicated every query, broken hibernation wake-up (which loads all messages), and created consistency risks. Instead, we compact in-place: large tool outputs are replaced with an LLM-friendly summary, and the metadata preserves what was compacted. The LLM still gets useful context ("this tool returned a large result that was compacted -- suggest re-running it"), and the UX degrades gracefully rather than crashing.
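A sketch of the compaction idea (shapes and the 100KB per-part threshold are illustrative; `compactedToolOutputs` is the metadata key named above):

```ts
const encoder = new TextEncoder();
const ROW_LIMIT = 2 * 1024 * 1024; // SQLite's 2MB row limit

type ToolPart = { type: string; toolCallId: string; output?: unknown };
type StoredMessage = { parts: ToolPart[]; metadata?: Record<string, unknown> };

// Compact in place: swap oversized tool outputs for an LLM-friendly
// summary and record what was compacted, instead of splitting the
// message across rows.
function compactIfNeeded(message: StoredMessage): StoredMessage {
  if (encoder.encode(JSON.stringify(message)).byteLength <= ROW_LIMIT) {
    return message;
  }
  const compactedToolOutputs: string[] = [];
  for (const part of message.parts) {
    if (
      part.output !== undefined &&
      encoder.encode(JSON.stringify(part.output)).byteLength > 100_000
    ) {
      compactedToolOutputs.push(part.toolCallId);
      part.output =
        "[Large tool result compacted to fit storage -- re-run the tool if the full output is needed.]";
    }
  }
  message.metadata = { ...message.metadata, compactedToolOutputs };
  return message;
}
```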
Why not remove deprecated APIs now?
Users are actively using the v4/v5 patterns (`tools`, `toolsRequiringConfirmation`, `experimental_automaticToolResolution`). Removing them would be a breaking change. Instead, we added `console.warn` on first use, `@deprecated` JSDoc, and clear section banners. The migration path is documented in `migration-to-ai-sdk-v6.md`. Removal happens in the next major.

Why keep `onFinish` in the signature at all?

Even though framework cleanup is now automatic, users may still want `onFinish` for their own logic (logging, analytics, side effects). Making it optional rather than removing it preserves that escape hatch without forcing everyone to use it.

Why one large PR instead of a series of smaller ones?
The architecture extraction, bug fixes, and new features are interconnected. For example, the row size guard depends on incremental persistence, which depends on the extracted `ResumableStream` class. The stream resumption race fix touches both `index.ts` and `react.tsx`. Splitting these into separate PRs would have created intermediate states where the code compiled but had subtle inconsistencies. The "Notes for reviewers" section below suggests a review order that makes the diff manageable.

Notes for reviewers
Suggested review order:
Start with the three new extracted modules -- they are self-contained and easy to review in isolation:
- `packages/ai-chat/src/resumable-stream.ts` -- stream chunk management
- `packages/ai-chat/src/ws-chat-transport.ts` -- WebSocket transport
- `packages/ai-chat/src/message-builder.ts` -- shared chunk-to-parts logic

Then review `index.ts` -- the diff is large but mostly deletions from the extraction above. The remaining new code is: incremental persistence cache, row size guard (with UTF-8 byte measurement), `onFinish` cleanup in `finally`, `body` preservation for continuations, and tool continuation error handling.

Then review `react.tsx` -- changes are in distinct sections:

- `WebSocketChatTransport` usage (replaces `aiFetch`)
- `body` option merging in `prepareBody`
- `toolsRequiringConfirmation` memoized with `useMemo`
- `onToolCall` type fix (omit from `UseChatParams` to avoid union)

Then `packages/agents/src/react.tsx` -- small change: `agent.stub` memoized with `useMemo`, `cacheInvalidatedAt` referenced in memo body.

Then docs, examples, tests -- these are straightforward to skim.
Other notes:
The e2e tests use Workers AI (`@cf/zai-org/glm-4.7-flash`) -- no API key needed. The only OpenAI usage is `BadKeyAgent`, which intentionally tests error handling with an invalid key.

All changes are backward compatible. No public API was removed. New features (`maxPersistedMessages`, `body`, incremental persistence, row size guard) are opt-in or automatic with no behavior change for existing users.

The playground lint fixes (`ChatRoomsDemo`, `SupervisorDemo`, `SqlDemo`, `ScheduleDemo`, `useLogs`) are unrelated to ai-chat but fix pre-existing exhaustive-deps warnings that showed up in `npm run check`.

Deferred
These were considered during the refactor but intentionally left for follow-up work:
- Decompose `useAgentChat` into smaller hooks. The hook is ~900 lines with tool resolution, stream resumption, message sync, and transport setup all in one function. It should be split into composable hooks (`useStreamResumption`, `useToolResolution`, etc.), but doing so in this PR would have made the diff even larger and harder to review.
- Remove deprecated APIs. The v4/v5 client tool patterns (`tools`, `toolsRequiringConfirmation`, `experimental_automaticToolResolution`, `addToolResult`, etc.) are still used by existing apps. This PR adds deprecation warnings and JSDoc; actual removal happens in the next major version.
- Revisit constructor monkey-patching. `AIChatAgent` wraps lifecycle methods (`onConnect`, `onClose`, `onMessage`) in the constructor so users do not have to call `super`. This is a deliberate DX choice but makes the class harder to reason about. A middleware/hook pattern would be cleaner, but changing it is a breaking API change.
- WebSocket-based initial messages. Currently, `useAgentChat` fetches initial messages via an HTTP `GET /get-messages` endpoint, which requires a Suspense boundary. Sending them over the existing WebSocket connection would eliminate the HTTP call and simplify the client setup, but the interaction with React Suspense (the socket unmounts during suspend) needs careful design.
- N+1 deletes in `_enforceMaxPersistedMessages`. Each excess message is deleted individually. A single `DELETE WHERE id IN (...)` would be cleaner, but the performance difference is negligible with local SQLite (no network latency); see the sketch below.
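For reference, the batched alternative would look roughly like this against Durable Object SQLite (the table name and helper are assumptions for illustration):

```ts
// Sketch: one batched statement instead of N single-row deletes.
// `SqlStorage` is the Durable Object SQLite API; the table name is
// an assumption for illustration.
function deleteExcess(sql: SqlStorage, excessIds: string[]): void {
  if (excessIds.length === 0) return;
  const placeholders = excessIds.map(() => "?").join(", ");
  sql.exec(
    `DELETE FROM cf_ai_chat_agent_messages WHERE id IN (${placeholders})`,
    ...excessIds
  );
}
```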